{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b3Y3DOVqtIbc"
      },
      "source": [
        "# Example - Improve Retrievers using Rerankers & Hybrid search\n",
        "\n",
        "## Optimizing RAG retrieval performance using hybrid search & reranking"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6gUUIxGP0n1Z",
        "outputId": "0319735d-5986-470b-ad7a-3e6a9a4032f6"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m177.4/177.4 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m139.2/139.2 kB\u001b[0m \u001b[31m6.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m16.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m10.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.4/12.4 MB\u001b[0m \u001b[31m51.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m82.7/82.7 kB\u001b[0m \u001b[31m12.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m11.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m7.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h"
          ]
        }
      ],
      "source": [
        "!pip install lancedb sentence-transformers cohere tantivy pyarrow==13.0.0 -q"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DQSVI4GSjU0b"
      },
      "source": [
        "## What is a retriever\n",
        "VectorDBs are used as retrievers in recommender or chatbot-based systems for retrieving relevant data based on user queries. For example, retriever is a critical component of Retrieval Augmented Generation (RAG) acrhitectures. In this section, we will discuss how to improve the performance of retrievers.\n",
        "\n",
        "<img src=\"https://llmstack.ai/assets/images/rag-f517f1f834bdbb94a87765e0edd40ff2.png\" />\n",
        "\n",
        "[source](https://llmstack.ai/assets/images/rag-f517f1f834bdbb94a87765e0edd40ff2.png)\n",
        "\n",
        "## How do you go about improving retreival performance\n",
        "Some of the common techniques are:\n",
        "\n",
        "- Using different search types - vector/semantic, FTS (BM25)\n",
        "- Hybrid search\n",
        "- Reranking\n",
        "- Fine-tuning the embedding models\n",
        "- Using different embedding models\n",
        "\n",
        "Obviously, the above list is not exhaustive. There are other subtler ways that can improve retrieval performance like alternative chunking algorithms, using different distance/similarity metrics, and more. For brevity, we'll only cover high level and more impactful techniques here.\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3ZCm3-Bog9g7"
      },
      "source": [
        "# LanceDB\n",
        "- Multimodal DB for AI\n",
        "- Powered by an innovative & open-source in-house file format\n",
        "- Zero setup\n",
        "- Scales up on disk storage\n",
        "- Native support for vector, full-text(BM25) and hybrid search\n",
        "\n",
        "<img src=\"https://lancedb.github.io/lancedb/assets/lancedb_and_lance.png\"\n",
        "style=\"margin:auto\" />\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b1fzhbQc4O1u"
      },
      "source": [
        "## The dataset\n",
        "The dataset we'll use is a synthetic QA dataset generated from LLama2 review paper. The paper was divided into chunks, with each chunk being a unique context. An LLM was prompted to ask questions relevant to the context for testing a retriever.\n",
        "The exact code and other utility functions for this can be found in [this](https://github.com/lancedb/ragged) repo.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "f_qnH-Dfhi9Z",
        "outputId": "1e22e1b1-a821-4ccb-ff30-1b2d6f8b824e"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2024-07-24 14:22:47--  https://raw.githubusercontent.com/AyushExel/assets/main/data_qa.csv\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 680439 (664K) [text/plain]\n",
            "Saving to: ‘data_qa.csv’\n",
            "\n",
            "data_qa.csv         100%[===================>] 664.49K  --.-KB/s    in 0.03s   \n",
            "\n",
            "2024-07-24 14:22:48 (19.9 MB/s) - ‘data_qa.csv’ saved [680439/680439]\n",
            "\n"
          ]
        }
      ],
      "source": [
        "!wget https://raw.githubusercontent.com/AyushExel/assets/main/data_qa.csv"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "ZNNAUc6f7ILI"
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "data = pd.read_csv(\"data_qa.csv\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 580
        },
        "id": "4Bp9Fdhz7QsM",
        "outputId": "fdcbc090-d526-4dcb-98a2-c0d8090f295d"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "     Unnamed: 0                                              query  \\\n",
              "0             0  How does the performance of Llama 2-Chat model...   \n",
              "1             1  What benefits does the enhancement and safety ...   \n",
              "2             2  How does one ensure the reliability and robust...   \n",
              "3             3  What methodologies are employed to align machi...   \n",
              "4             4  What are some of the primary insights gained f...   \n",
              "..          ...                                                ...   \n",
              "215         215  How are the terms 'clean', 'not clean', 'dirty...   \n",
              "216         216  How does the size of the model influence the a...   \n",
              "217         217  What impact does the model contamination have ...   \n",
              "218         218  What are the different sizes and types availab...   \n",
              "219         219  Could you discuss the sustainability measures ...   \n",
              "\n",
              "                                               context  \\\n",
              "0    Llama 2 : Open Foundation and Fine-Tuned Chat ...   \n",
              "1    Llama 2 : Open Foundation and Fine-Tuned Chat ...   \n",
              "2    Contents\\n1 Introduction 3\\n2 Pretraining 5\\n2...   \n",
              "3    Contents\\n1 Introduction 3\\n2 Pretraining 5\\n2...   \n",
              "4    . . . . . . . . 23\\n4.3 Red Teaming . . . . . ...   \n",
              "..                                                 ...   \n",
              "215  Giventhe\\nembarrassinglyparallelnatureofthetas...   \n",
              "216  Dataset Model Subset Type Avg. Contam. % n ¯X ...   \n",
              "217  Dataset Model Subset Type Avg. Contam. % n ¯X ...   \n",
              "218  A.7 Model Card\\nTable 52 presents a model card...   \n",
              "219  A.7 Model Card\\nTable 52 presents a model card...   \n",
              "\n",
              "                                                answer  \n",
              "0    Llama 2-Chat models have shown to exceed the p...  \n",
              "1    The safety and enhancement measures implemente...  \n",
              "2    In the initial steps of model development, the...  \n",
              "3    Machine learning models can be aligned with de...  \n",
              "4    The key insights gained from evaluating platfo...  \n",
              "..                                                 ...  \n",
              "215  In the discussed dataset analysis, samples are...  \n",
              "216  The size of the model significantly influences...  \n",
              "217  Model contamination affects various contaminat...  \n",
              "218  Llama 2 is available in three distinct paramet...  \n",
              "219  Throughout the training of Llama 2, which invo...  \n",
              "\n",
              "[220 rows x 4 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-7f0cc1a4-3f03-452b-a274-5569309539c0\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>query</th>\n",
              "      <th>context</th>\n",
              "      <th>answer</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>How does the performance of Llama 2-Chat model...</td>\n",
              "      <td>Llama 2 : Open Foundation and Fine-Tuned Chat ...</td>\n",
              "      <td>Llama 2-Chat models have shown to exceed the p...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1</td>\n",
              "      <td>What benefits does the enhancement and safety ...</td>\n",
              "      <td>Llama 2 : Open Foundation and Fine-Tuned Chat ...</td>\n",
              "      <td>The safety and enhancement measures implemente...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>2</td>\n",
              "      <td>How does one ensure the reliability and robust...</td>\n",
              "      <td>Contents\\n1 Introduction 3\\n2 Pretraining 5\\n2...</td>\n",
              "      <td>In the initial steps of model development, the...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>3</td>\n",
              "      <td>What methodologies are employed to align machi...</td>\n",
              "      <td>Contents\\n1 Introduction 3\\n2 Pretraining 5\\n2...</td>\n",
              "      <td>Machine learning models can be aligned with de...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>4</td>\n",
              "      <td>What are some of the primary insights gained f...</td>\n",
              "      <td>. . . . . . . . 23\\n4.3 Red Teaming . . . . . ...</td>\n",
              "      <td>The key insights gained from evaluating platfo...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>215</th>\n",
              "      <td>215</td>\n",
              "      <td>How are the terms 'clean', 'not clean', 'dirty...</td>\n",
              "      <td>Giventhe\\nembarrassinglyparallelnatureofthetas...</td>\n",
              "      <td>In the discussed dataset analysis, samples are...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>216</th>\n",
              "      <td>216</td>\n",
              "      <td>How does the size of the model influence the a...</td>\n",
              "      <td>Dataset Model Subset Type Avg. Contam. % n ¯X ...</td>\n",
              "      <td>The size of the model significantly influences...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>217</th>\n",
              "      <td>217</td>\n",
              "      <td>What impact does the model contamination have ...</td>\n",
              "      <td>Dataset Model Subset Type Avg. Contam. % n ¯X ...</td>\n",
              "      <td>Model contamination affects various contaminat...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>218</th>\n",
              "      <td>218</td>\n",
              "      <td>What are the different sizes and types availab...</td>\n",
              "      <td>A.7 Model Card\\nTable 52 presents a model card...</td>\n",
              "      <td>Llama 2 is available in three distinct paramet...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>219</th>\n",
              "      <td>219</td>\n",
              "      <td>Could you discuss the sustainability measures ...</td>\n",
              "      <td>A.7 Model Card\\nTable 52 presents a model card...</td>\n",
              "      <td>Throughout the training of Llama 2, which invo...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>220 rows × 4 columns</p>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-7f0cc1a4-3f03-452b-a274-5569309539c0')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-7f0cc1a4-3f03-452b-a274-5569309539c0 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-7f0cc1a4-3f03-452b-a274-5569309539c0');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-dfb13a8d-ae02-4de8-bb5e-8000c749f494\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-dfb13a8d-ae02-4de8-bb5e-8000c749f494')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-dfb13a8d-ae02-4de8-bb5e-8000c749f494 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "  <div id=\"id_9ea37551-f803-4082-aeab-8b7746977268\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('data')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_9ea37551-f803-4082-aeab-8b7746977268 button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('data');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "data",
              "summary": "{\n  \"name\": \"data\",\n  \"rows\": 220,\n  \"fields\": [\n    {\n      \"column\": \"Unnamed: 0\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 63,\n        \"min\": 0,\n        \"max\": 219,\n        \"num_unique_values\": 220,\n        \"samples\": [\n          132,\n          148,\n          93\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"query\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 220,\n        \"samples\": [\n          \"What type of examination did scholars perform on ChatGPT, and when was the resulting scholarly paper published?\",\n          \"How do the performance capabilities of the different models compare in evaluating tasks associated with logical reasoning and reading comprehension, specifically noted in tests like LSAT and SAT?\",\n          \"What steps are recommended for users to ensure the responsible use of AI models like Llama 2 in projects or commercial applications?\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"context\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 110,\n        \"samples\": [\n          \"Dialogue Turn Baseline + GAtt\\n2 100% 100%\\n4 10% 100%\\n6 0% 100%\\n20 0% 100%\\nTable30: GAttresults. Llama 2-Chat withGAttisabletorefertoattributes100%ofthetime,forupto20\\nturns from our human evaluation. We limited the evaluated attributes to public figures and hobbies.\\nTheattentionnowspansbeyond20turns. Wetestedthemodelabilitytorememberthesystemarguments\\ntroughahumanevaluation. Thearguments(e.g. hobbies,persona)aredefinedduringthefirstmessage,and\\nthen from turn 2 to 20. We explicitly asked the model to refer to them (e.g. \\u201cWhat is your favorite hobby?\\u201d,\\n\\u201cWhatisyourname?\\u201d),tomeasurethemulti-turnmemoryabilityof Llama 2-Chat . Wereporttheresults\\ninTable30. EquippedwithGAtt, Llama 2-Chat maintains100%accuracy,alwaysreferringtothedefined\\nattribute,andso,upto20turns(wedidnotextendthehumanevaluationmore,andalltheexampleshad\\nlessthan4048tokensintotalovertheturns). Asacomparison, Llama 2-Chat withoutGAttcannotanymore\\nrefer to the attributes after only few turns: from 100% at turn t+1, to 10% at turn t+3 and then 0%.\\nGAttZero-shotGeneralisation. Wetriedatinferencetimetosetconstrainnotpresentinthetrainingof\\nGAtt. For instance, \\u201canswer in one sentence only\\u201d, for which the model remained consistent, as illustrated in\\nFigure 28.\\nWe applied first GAtt to Llama 1 , which was pretrained with a context length of 2048 tokens and then\\nfine-tuned with 4096 max length. We tested if GAtt works beyond 2048 tokens, and the model arguably\\nmanaged to understand attributes beyond this window. This promising result indicates that GAtt could be\\nadapted as an efficient technique for long context attention.\\nA.3.6 How Far Can Model-Based Evaluation Go?\\nTo measure the robustness of our reward model, we collected a test set of prompts for both helpfulness and\\nsafety,andaskedannotatorstojudgequalityoftheanswersbasedona7pointLikert-scale(thehigherthe\\nbetter)usingtriplereviews. AsillustratedinFigure29(inAppendix),weobservethatourrewardmodels\\noverallarewellcalibratedwithhumanpreference. Notethatthisenablesustousetherewardasapoint-wise\\nmetric, despite being trained with a Pairwise Ranking Loss.\\n0.0% 2.0% 4.0% 6.0% 8.0%\\nDensity0.00.20.40.60.81.0Reward Model ScoreNo Margin\\n0.0% 2.0% 4.0% 6.0% 8.0%\\nDensity0.00.20.40.60.81.0\\nMargin Small\\n0.0% 2.0% 4.0% 6.0% 8.0%\\nDensity0.00.20.40.60.81.0\\nMargin Large\\nFigure 27: Reward model score distribution shift caused by incorporating preference rating based margin\\ninrankingloss. Withthemarginterm, weobserveabinary splitpatterninrewarddistribution, especially\\nwith a larger margin.\\n54\",\n          \"Model Size CodeCommonsense\\nReasoningWorld\\nKnowledgeReading\\nComprehensionMath MMLU BBH AGI Eval\\nMPT7B 20.5 57.4 41.0 57.5 4.9 26.8 31.0 23.5\\n30B 28.9 64.9 50.0 64.7 9.1 46.9 38.0 33.8\\nFalcon7B 5.6 56.1 42.8 36.0 4.6 26.2 28.0 21.2\\n40B 15.2 69.2 56.7 65.7 12.6 55.4 37.1 37.0\\nLlama 17B 14.1 60.8 46.2 58.5 6.95 35.1 30.3 23.9\\n13B 18.9 66.1 52.6 62.3 10.9 46.9 37.0 33.9\\n33B 26.0 70.0 58.4 67.6 21.4 57.8 39.8 41.7\\n65B 30.7 70.7 60.5 68.6 30.8 63.4 43.5 47.6\\nLlama 27B 16.8 63.9 48.9 61.3 14.6 45.3 32.6 29.3\\n13B 24.5 66.9 55.4 65.8 28.7 54.8 39.4 39.1\\n34B 27.8 69.9 58.7 68.0 24.2 62.6 44.1 43.4\\n70B37.5 71.9 63.6 69.4 35.2 68.9 51.2 54.2\\nTable3: Overallperformanceongroupedacademicbenchmarkscomparedtoopen-sourcebasemodels.\\n\\u2022Popular Aggregated Benchmarks . We report the overall results for MMLU (5 shot) (Hendrycks\\net al., 2020), Big Bench Hard (BBH) (3 shot) (Suzgun et al., 2022), and AGI Eval (3\\u20135 shot) (Zhong\\net al., 2023). For AGI Eval, we only evaluate on the English tasks and report the average.\\nAs shown in Table 3, Llama 2 models outperform Llama 1 models. In particular, Llama 2 70B improves the\\nresultsonMMLUandBBHby \\u22485and\\u22488points,respectively,comparedto Llama 1 65B.Llama 2 7Band30B\\nmodelsoutperformMPTmodelsofthecorrespondingsizeonallcategoriesbesidescodebenchmarks. Forthe\\nFalcon models, Llama 2 7B and 34B outperform Falcon 7B and 40B models on all categories of benchmarks.\\nAdditionally, Llama 2 70B model outperforms all open-source models.\\nIn addition to open-source models, we also compare Llama 2 70B results to closed-source models. As shown\\nin Table 4, Llama 2 70B is close to GPT-3.5 (OpenAI, 2023) on MMLU and GSM8K, but there is a significant\\ngaponcodingbenchmarks. Llama 2 70BresultsareonparorbetterthanPaLM(540B)(Chowdheryetal.,\\n2022)onalmostallbenchmarks. Thereisstillalargegapinperformancebetween Llama 2 70BandGPT-4\\nand PaLM-2-L.\\nWe also analysed the potential data contamination and share the details in Section A.6.\",\n          \"Figure 1: Helpfulness human evaluation results for Llama\\n2-Chatcomparedtootheropen-sourceandclosed-source\\nmodels. Human raters compared model generations on ~4k\\npromptsconsistingofbothsingleandmulti-turnprompts.\\nThe95%confidenceintervalsforthisevaluationarebetween\\n1%and2%. MoredetailsinSection3.4.2. Whilereviewing\\nthese results, it is important to note that human evaluations\\ncanbenoisyduetolimitationsofthepromptset,subjectivity\\nof the review guidelines, subjectivity of individual raters,\\nand the inherent difficulty of comparing generations.\\nFigure 2: Win-rate % for helpfulness and\\nsafety between commercial-licensed base-\\nlines and Llama 2-Chat , according to GPT-\\n4. Tocomplementthehumanevaluation,we\\nused a more capable model, not subject to\\nourownguidance. Greenareaindicatesour\\nmodelisbetteraccordingtoGPT-4. Toremove\\nties, we used win/ (win+loss). The orders in\\nwhichthemodelresponsesarepresentedto\\nGPT-4arerandomlyswappedtoalleviatebias.\\n1 Introduction\\nLarge Language Models (LLMs) have shown great promise as highly capable AI assistants that excel in\\ncomplex reasoning tasks requiring expert knowledge across a wide range of fields, including in specialized\\ndomains such as programming and creative writing. They enable interaction with humans through intuitive\\nchat interfaces, which has led to rapid and widespread adoption among the general public.\\nThecapabilitiesofLLMsareremarkableconsideringtheseeminglystraightforwardnatureofthetraining\\nmethodology. Auto-regressivetransformersarepretrainedonanextensivecorpusofself-superviseddata,\\nfollowed by alignment with human preferences via techniques such as Reinforcement Learning with Human\\nFeedback(RLHF).Althoughthetrainingmethodologyissimple,highcomputationalrequirementshave\\nlimited the development of LLMs to a few players. There have been public releases of pretrained LLMs\\n(such as BLOOM (Scao et al., 2022), LLaMa-1 (Touvron et al., 2023), and Falcon (Penedo et al., 2023)) that\\nmatch the performance of closed pretrained competitors like GPT-3 (Brown et al., 2020) and Chinchilla\\n(Hoffmann et al., 2022), but none of these models are suitable substitutes for closed \\u201cproduct\\u201d LLMs, such\\nasChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyfine-tunedtoalignwithhuman\\npreferences, which greatly enhances their usability and safety. This step can require significant costs in\\ncomputeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\\nthe community to advance AI alignment research.\\nIn this work, we develop and release Llama 2, a family of pretrained and fine-tuned LLMs, Llama 2 and\\nLlama 2-Chat , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\\nLlama 2-Chat models generally perform better than existing open-source models. They also appear to\\nbe on par with some of the closed-source models, at least on the human evaluations we performed (see\\nFigures1and3). Wehavetakenmeasurestoincreasethesafetyofthesemodels,usingsafety-specificdata\\nannotation and tuning, as well as conducting red-teaming and employing iterative evaluations. Additionally,\\nthispapercontributesathoroughdescriptionofourfine-tuningmethodologyandapproachtoimproving\\nLLM safety. We hope that this openness will enable the community to reproduce fine-tuned LLMs and\\ncontinue to improve the safety of those models, paving the way for more responsible development of LLMs.\\nWealsosharenovelobservationswemadeduringthedevelopmentof Llama 2 andLlama 2-Chat ,suchas\\nthe emergence of tool usage and temporal organization of knowledge.\\n3\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"answer\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 220,\n        \"samples\": [\n          \"Scholars performed a diagnostic analysis to investigate the AI ethics associated with ChatGPT. Their findings were compiled into a research paper that became accessible as a preprint on arXiv in January 2023.\",\n          \"The MPT 30B model demonstrates considerable proficiency in logical reasoning and reading comprehension tasks, scoring highly on LSAT-LR, LSAT-RC, and SAT-en tests compared to its peers, such as Falcon 40B and Llama 17B. This is indicative of its advanced analytical and comprehension abilities. Conversely, while Falcon 40B shows strengths in LSAT-LR with a score second only to MPT 30B, it trails in SAT-en performance. This variability underscores the diverse capabilities of models based on their structural design and training paradigms.\",\n          \"Users intending to deploy models like Llama 2 are advised to strictly adhere to guidelines laid out in the Responsible Use Guide. This includes employing enhanced safety measures at both the input and output stages of model interaction, as well as carefully tuning the model according to specific use-case requirements to prevent any potential misuse. Additionally, users must comply with the terms set in the Acceptable Use Policy, ensuring their applications do not contravene applicable laws, regulations, and ethical standards. Leveraging provided code examples can further assist developers in replicating the necessary safety protocols and maintaining ethical integrity in their applications.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 4
        }
      ],
      "source": [
        "data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DJ1eG8XvPblc"
      },
      "source": [
        "## Ingestion\n",
        "Let us now ingest the contexts in LanceDB. The steps will be:\n",
        "\n",
        "- Create a schema (Pydantic or Pyarrow)\n",
        "- Select an embedding model from LanceDB Embedding API (to allow automatic vectorization of data)\n",
        "- Ingest the contexts\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 336,
          "referenced_widgets": [
            "7d93a81fcd5f4f9c8952396a9f72be02",
            "ae8bc663ba0e44ddb830a5b50b2e92f8",
            "de0d2e4cb7b346a4ac0b55b095caff98",
            "52441756f18a4a52a2a6c839c4ff892c",
            "d143d1522f564b78a24e92bd0290f4b5",
            "a4decda69da348dfa0f0ec38c5ceb9d6",
            "1b95c82d481b4159bf7be3aefa4c0258",
            "7e49893b47174c138237e9a29584c0d0",
            "e36ba6906dd74973a48dba81ebb1f799",
            "f3e98d664e2441ad9198ee0ee947b27e",
            "fcf4a6a5fd3a45908a7727c4abefde44",
            "dec5401f1de14ba690c3e829fe4fe0ae",
            "19f5d1e903ea4f4faf62c179e7669234",
            "74ad6e3b67554d33ac6422de1e3e475a",
            "59915c4b816e420b942e6b0996c279f1",
            "f5e61067348c4b01b3c0ba09e5a52f87",
            "4a5fdf23f24c4f21896705734bc1e031",
            "3c419d422bd34cec867538615193558a",
            "4a6a4f7be2e44ade93e46702f037ebc1",
            "c460e5a56c624f43baaed1fff6aa72e2",
            "8a4e12a8f5da4f9990498c562a94116c",
            "c2253beb48cd4ce7a2eb3f1ce130f520",
            "1a1832edb11e4363af5a0c55ba013e91",
            "5e0cd1901ad444d28336c69d75b84e91",
            "1797c51500d0425496e390ebcf9729ac",
            "4993c83bb4eb429797bb9928a7c86547",
            "8f2f21ce257f457e962624bba5c3ca71",
            "c64af78fb7424710ac50c20293719123",
            "e2189d31c33f4d02983681d814d7ec28",
            "2c6fcec076ae49fb9def49c54169e0e3",
            "40782ef0cbfc4b24839015796c303869",
            "674ddce0b8cc4fad93c863469fb7496c",
            "2115bc215a574a4c90accc8e643ccc5c",
            "4a3f41c780d940eda128cc1efe82cb46",
            "b18a507ffc8e4b0ba825fdecac8980e1",
            "796ba03995264bddad91fcc999e0f073",
            "083b7d262f8945cc9b8fe928dbf9cee1",
            "aeddd98812264d939c86b026d63682f0",
            "ac96917d1e2341a5acf2b0236344b57d",
            "20b2149ba0984c23ad726cc72f21ae6d",
            "640859301f01490eb5041bd73667bdd0",
            "be70a0355abf4bed9c8b887917721879",
            "5bbbad4117bd4949adf34d58bb29d312",
            "e640e77458964a1ba16654655bbe4ee1",
            "31f10c25646949868814da02b21c8de2",
            "eac57eb6d624437b9c04cddcdf1f53f4",
            "0fb15815e4724685904aada95b00b1b5",
            "cb45a898fb1745ff94bce928c64bfab5",
            "bdacf681835145dda8867513e301f403",
            "9b49cbad496d4e9ab5af69668e842ba6",
            "6d1a94ae94e548058d34a5df5dbd563d",
            "455abd9dfd614a53b4fa35f55542a9e9",
            "878ca6976a414ba2a43e86dbf75ce45c",
            "6133892fd2da4778bc0cc08667cb2673",
            "4f30757e8c7f4b4b925d249a5369ed51",
            "7c8689c8604c4626b6092d33d80ff6cd",
            "f21f2f75e7ea42d5a6979ee722e80fe2",
            "019aa571fc8f4f40afd65f728327e0b4",
            "9b2206d9cd6d415bb4ce0d806aaaf473",
            "50f3506a5bbc4a529d20b3f85fa78260",
            "a03c9bd4a5e74a6182bbe774411899a0",
            "2da0ec344438406ca44fa75e6523867b",
            "4164c24fbe004abbbb7312da0682c8fc",
            "db17d3f6979d4ba2b8184603f226e910",
            "19d1e2c150f44fa08941117a42a2505b",
            "d23e52576ac74f3a9962ff68f786343b"
          ]
        },
        "id": "B_g5pIkBQ66h",
        "outputId": "ff31e6b0-745c-4e90-9da7-9e8c7b3c9b6f"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
            "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
            "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
            "You will be able to reuse this secret in all of your notebooks.\n",
            "Please note that authentication is recommended but still optional to access public models or datasets.\n",
            "  warnings.warn(\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "7d93a81fcd5f4f9c8952396a9f72be02"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "dec5401f1de14ba690c3e829fe4fe0ae"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "1a1832edb11e4363af5a0c55ba013e91"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "4a3f41c780d940eda128cc1efe82cb46"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "31f10c25646949868814da02b21c8de2"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "7c8689c8604c4626b6092d33d80ff6cd"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "# Define schema using Pydantic. We're using Embedding API to automatically vectorize dataset and queries\n",
        "import torch\n",
        "from lancedb.pydantic import LanceModel, Vector\n",
        "from lancedb.embeddings import get_registry\n",
        "\n",
        "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
        "embed_model = get_registry().get(\"huggingface\").create(name=\"BAAI/bge-small-en-v1.5\", device=device)\n",
        "\n",
        "class Schema(LanceModel):\n",
        "    text: str = embed_model.SourceField()\n",
        "    vector: Vector(embed_model.ndims()) = embed_model.VectorField()\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "4d4ak1nOR-da"
      },
      "outputs": [],
      "source": [
        "# Create a local lancedb connection\n",
        "import lancedb\n",
        "\n",
        "db = lancedb.connect(\"~/lancedb/\")\n",
        "tbl = db.create_table(\"qa_data\", schema=Schema, mode=\"overwrite\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NsL0jWp-Sr1h",
        "outputId": "b19c3c19-a192-4adf-cbe1-df24be0b21da"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[{'text': 'Llama 2 : Open Foundation and Fine-Tuned Chat Models\\nHugo Touvron∗Louis Martin†Kevin Stone†\\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev\\nPunit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich\\nYinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra\\nIgor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi\\nAlan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\\nSergey Edunov Thomas Scialom∗\\nGenAI, Meta\\nAbstract\\nIn this work, we develop and release Llama 2, a collection of pretrained and fine-tuned\\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\\nOur fine-tuned LLMs, called Llama 2-Chat , are optimized for dialogue use cases. Our\\nmodels outperform open-source chat models on most benchmarks we tested, and based on\\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosed-\\nsource models. We provide a detailed description of our approach to fine-tuning and safety\\nimprovements of Llama 2-Chat in order to enable the community to build on our work and\\ncontribute to the responsible development of LLMs.\\n∗Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com\\n†Second author\\nContributions for all the authors can be found in Section A.1.arXiv:2307.09288v2  [cs.CL]  19 Jul 2023'}, {'text': 'Contents\\n1 Introduction 3\\n2 Pretraining 5\\n2.1 Pretraining Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5\\n2.2 Training Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5\\n2.3 Llama 2 Pretrained Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7\\n3 Fine-tuning 8\\n3.1 Supervised Fine-Tuning (SFT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9\\n3.2 Reinforcement Learning with Human Feedback (RLHF) . . . . . . . . . . . . . . . . . . . . . 9\\n3.3 System Message for Multi-Turn Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16\\n3.4 RLHF Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17\\n4 Safety 20\\n4.1 Safety in Pretraining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20\\n4.2 Safety Fine-Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23\\n4.3 Red Teaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28\\n4.4 Safety Evaluation of Llama 2-Chat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .'}, {'text': '. . . . . . . . 23\\n4.3 Red Teaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28\\n4.4 Safety Evaluation of Llama 2-Chat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29\\n5 Discussion 32\\n5.1 Learnings and Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32\\n5.2 Limitations and Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34\\n5.3 Responsible Release Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35\\n6 Related Work 35\\n7 Conclusion 36\\nA Appendix 46\\nA.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46\\nA.2 Additional Details for Pretraining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47\\nA.3 Additional Details for Fine-tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51\\nA.4 Additional Details for Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58\\nA.5 Data Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72\\nA.6 Dataset Contamination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .'}, {'text': '. . . . . . 58\\nA.5 Data Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72\\nA.6 Dataset Contamination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75\\nA.7 Model Card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77\\n2'}, {'text': 'Figure 1: Helpfulness human evaluation results for Llama\\n2-Chatcomparedtootheropen-sourceandclosed-source\\nmodels. Human raters compared model generations on ~4k\\npromptsconsistingofbothsingleandmulti-turnprompts.\\nThe95%confidenceintervalsforthisevaluationarebetween\\n1%and2%. MoredetailsinSection3.4.2. Whilereviewing\\nthese results, it is important to note that human evaluations\\ncanbenoisyduetolimitationsofthepromptset,subjectivity\\nof the review guidelines, subjectivity of individual raters,\\nand the inherent difficulty of comparing generations.\\nFigure 2: Win-rate % for helpfulness and\\nsafety between commercial-licensed base-\\nlines and Llama 2-Chat , according to GPT-\\n4. Tocomplementthehumanevaluation,we\\nused a more capable model, not subject to\\nourownguidance. Greenareaindicatesour\\nmodelisbetteraccordingtoGPT-4. Toremove\\nties, we used win/ (win+loss). The orders in\\nwhichthemodelresponsesarepresentedto\\nGPT-4arerandomlyswappedtoalleviatebias.\\n1 Introduction\\nLarge Language Models (LLMs) have shown great promise as highly capable AI assistants that excel in\\ncomplex reasoning tasks requiring expert knowledge across a wide range of fields, including in specialized\\ndomains such as programming and creative writing. They enable interaction with humans through intuitive\\nchat interfaces, which has led to rapid and widespread adoption among the general public.\\nThecapabilitiesofLLMsareremarkableconsideringtheseeminglystraightforwardnatureofthetraining\\nmethodology. Auto-regressivetransformersarepretrainedonanextensivecorpusofself-superviseddata,\\nfollowed by alignment with human preferences via techniques such as Reinforcement Learning with Human\\nFeedback(RLHF).Althoughthetrainingmethodologyissimple,highcomputationalrequirementshave\\nlimited the development of LLMs to a few players. There have been public releases of pretrained LLMs\\n(such as BLOOM (Scao et al., 2022), LLaMa-1 (Touvron et al., 2023), and Falcon (Penedo et al., 2023)) that\\nmatch the performance of closed pretrained competitors like GPT-3 (Brown et al., 2020) and Chinchilla\\n(Hoffmann et al., 2022), but none of these models are suitable substitutes for closed “product” LLMs, such\\nasChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyfine-tunedtoalignwithhuman\\npreferences, which greatly enhances their usability and safety. This step can require significant costs in\\ncomputeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\\nthe community to advance AI alignment research.\\nIn this work, we develop and release Llama 2, a family of pretrained and fine-tuned LLMs, Llama 2 and\\nLlama 2-Chat , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\\nLlama 2-Chat models generally perform better than existing open-source models. They also appear to\\nbe on par with some of the closed-source models, at least on the human evaluations we performed (see\\nFigures1and3). Wehavetakenmeasurestoincreasethesafetyofthesemodels,usingsafety-specificdata\\nannotation and tuning, as well as conducting red-teaming and employing iterative evaluations. Additionally,\\nthispapercontributesathoroughdescriptionofourfine-tuningmethodologyandapproachtoimproving\\nLLM safety. We hope that this openness will enable the community to reproduce fine-tuned LLMs and\\ncontinue to improve the safety of those models, paving the way for more responsible development of LLMs.\\nWealsosharenovelobservationswemadeduringthedevelopmentof Llama 2 andLlama 2-Chat ,suchas\\nthe emergence of tool usage and temporal organization of knowledge.\\n3'}]\n"
          ]
        }
      ],
      "source": [
        "contexts = [\n",
        "    {\"text\": context} for context in data[\"context\"].unique()\n",
        "]\n",
        "print(contexts[0:5])\n",
        "tbl.add(contexts)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tqtcoHDBZsvy"
      },
      "source": [
        "## Different Query types in LanceDB\n",
        "LanceDB allows switching query types with by setting `query_type` argument, which defaults to `vector` when using Embedding API. In this example we'll use `JinaReranker` which is one of many rerankers supported by LanceDB.\n",
        "\n",
        "### Vector search:\n",
        "Vector search\n",
        "\n",
        "```\n",
        "table.search(query, query_type=\"vector\")` or `table.search(query)\n",
        "```\n",
        "\n",
        "Vector search with Reranking\n",
        "```\n",
        "reranker = JinaReranker()\n",
        "table.search(query).rerank(reranker=reranker)\n",
        "```\n",
        "\n",
        "\n",
        "### Full-text search:\n",
        "FTS\n",
        "\n",
        "```\n",
        "table.search(query, query_type=\"fts\")\n",
        "```\n",
        "\n",
        "### FTS with Reranking\n",
        "```\n",
        "table.search(query, query_type=\"fts\").rerank(reranker=reranker)\n",
        "```\n",
        "\n",
        "### Hybrid search\n",
        "```\n",
        "table.search(query, query_type=\"hybrid\").rerank(reranker=reranker)\n",
        "```\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "8kII4ZFsQ2sm"
      },
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "Util for searching lancedb table with different query types and rerankers. In case of Vector and FTS only reranking, we'll overfetch the results\n",
        "by a factor of 2 and get top K after reranking. Without overfetching, vector only and fts only search results won't have any effect on hit-rate metric\n",
        "\"\"\"\n",
        "from lancedb.rerankers import Reranker\n",
        "\n",
        "VALID_QUERY_TYPES = [\"vector\", \"fts\", \"hybrid\", \"rerank_vector\", \"rerank_fts\"]\n",
        "\n",
        "def search_table(table: lancedb.table, reranker:Reranker, query_type: str, query_string: str, top_k:int=5, overfetch_factor:int=2):\n",
        "    if query_type not in VALID_QUERY_TYPES:\n",
        "        raise ValueError(f\"Invalid query type: {query_type}\")\n",
        "    if query_type in [\"hybrid\", \"rerank_vector\", \"rerank_fts\"] and reranker is None:\n",
        "        raise ValueError(f\"Reranker must be provided for query type: {query_type}\")\n",
        "\n",
        "    if query_type in [\"vector\", \"fts\"]:\n",
        "        rs = table.search(query_string, query_type=query_type).limit(top_k).to_pandas()\n",
        "    elif query_type == [\"rerank_vector\", \"rerank_fts\"]:\n",
        "        rs = table.search(query_string, query_type=query_type).rerank(reranker=reranker).limit(overfetch_factor*top_k).to_pandas()\n",
        "    elif query_type == \"hybrid\":\n",
        "        rs = table.search(query_string, query_type=query_type).rerank(reranker=reranker).limit(top_k).to_pandas()\n",
        "\n",
        "    return rs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3J40ks64Izdm"
      },
      "source": [
        "## Hit-rate eval metric\n",
        "\n",
        "We'll be using a simple metric called `\"hit-rate\"` for evaluating the performance of the retriever across this guide.\n",
        "\n",
        "Hit-rate is the percentage of queries for which the retriever returned the correct answer in the top-k results.\n",
        "\n",
        "For example, if the retriever returned the correct answer in the top-3 results for 70% of the queries, then the hit-rate@3 is 0.7."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "WkhIp-1lJD79"
      },
      "outputs": [],
      "source": [
        "import tqdm\n",
        "\n",
        "def hit_rate(ds, table, query_type:str, top_k:int = 5, reranker:Reranker = None) -> float:\n",
        "    eval_results = []\n",
        "    for idx in tqdm.tqdm(range(len(ds))):\n",
        "        query = ds[\"query\"][idx]\n",
        "        reference_context = ds[\"context\"][idx]\n",
        "        if not reference_context:\n",
        "            print(\"reference_context is None for query: {idx}. \\\n",
        "                            Skipping this query. Please check your dataset.\")\n",
        "            continue\n",
        "        try:\n",
        "            rs = search_table(table, reranker, query_type, query, top_k)\n",
        "        except Exception as e:\n",
        "            print(f'Error with query: {idx} {e}')\n",
        "            eval_results.append({\n",
        "                'is_hit': False,\n",
        "                'retrieved': [],\n",
        "                'expected': reference_context,\n",
        "                'query': query,\n",
        "            })\n",
        "            continue\n",
        "        retrieved_texts = rs['text'].tolist()[:top_k]\n",
        "        expected_text = reference_context[0] if isinstance(reference_context, list) else reference_context\n",
        "        is_hit = False\n",
        "\n",
        "        # HACK: to handle new line characters added my llamaindex doc reader\n",
        "        if expected_text in retrieved_texts or expected_text+'\\n' in retrieved_texts:\n",
        "            is_hit = True\n",
        "        eval_result = {\n",
        "            'is_hit': is_hit,\n",
        "            'retrieved': retrieved_texts,\n",
        "            'expected': expected_text,\n",
        "            'query': query,\n",
        "        }\n",
        "        eval_results.append(eval_result)\n",
        "\n",
        "    result = pd.DataFrame(eval_results)\n",
        "    hit_rate = result['is_hit'].mean()\n",
        "    return hit_rate"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iZzAVl2kJ5mV",
        "outputId": "0f4d6e5b-3096-4f58-fc36-c7909b475cfc"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|██████████| 220/220 [00:10<00:00, 21.62it/s]\n",
            "100%|██████████| 220/220 [00:00<00:00, 358.03it/s]"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            " Vector Search Hit Rate: 0.6409090909090909\n",
            "FTS Search Hit Rate: 0.5954545454545455\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "\n"
          ]
        }
      ],
      "source": [
        "tbl.create_fts_index(\"text\", replace=True)\n",
        "hit_rate_vector = hit_rate(data, tbl, \"vector\")\n",
        "hit_rate_fts = hit_rate(data, tbl, \"fts\")\n",
        "print(f\"\\n Vector Search Hit Rate: {hit_rate_vector}\")\n",
        "print(f\"FTS Search Hit Rate: {hit_rate_fts}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "2. Reranked vector search\n"
      ],
      "metadata": {
        "id": "-1B5OPDuI8NE"
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "ngbS5kvnI6N_"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Efmb9Gi2s9lD"
      },
      "source": [
        "## Hybrid Search\n",
        " <img src=\"https://blog.lancedb.com/content/images/2024/02/1_Zh4Jju6uiCYFO9HHvO5sIA.webp\" />\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ydLNeAr4acYj",
        "outputId": "0e455b2f-a10c-4ad2-ce36-52a90829dd10"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|██████████| 220/220 [00:10<00:00, 20.60it/s]"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            " Hybrid Search with LinearCombinationReranker Hit Rate: 0.6454545454545455\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "\n"
          ]
        }
      ],
      "source": [
        "from lancedb.rerankers import LinearCombinationReranker # LanceDB hybrid search uses LinearCombinationReranker by default\n",
        "\n",
        "reranker = LinearCombinationReranker(weight=0.7)\n",
        "hit_rate_hybrid = hit_rate(data, tbl, \"hybrid\", reranker=reranker)\n",
        "\n",
        "print(f\"\\n Hybrid Search with LinearCombinationReranker Hit Rate: {hit_rate_hybrid}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Wswq157ptjTZ"
      },
      "source": [
        "## Trying out different rerankers\n",
        "\n",
        "### 1. Cross Encoder Reranker\n",
        "\n",
        " <img src=\"https://raw.githubusercontent.com/UKPLab/sentence-transformers/master/docs/img/Bi_vs_Cross-Encoder.png\" />\n",
        "\n",
        "Bi-Encoders produce for a given sentence a sentence embedding. We pass to a BERT independently the sentences A and B, which result in the sentence embeddings u and v. These sentence embedding can then be compared using cosine similarity.\n",
        "\n",
        "In contrast, for a Cross-Encoder, we pass both sentences simultaneously to the Transformer network. It produces then an output value between 0 and 1 indicating the similarity of the input sentence pair:\n",
        "\n",
        "A Cross-Encoder does not produce a sentence embedding. Also, we are not able to pass individual sentences to a Cross-Encoder."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 248,
          "referenced_widgets": [
            "d718ecc1163942e4a5dd4cea73a90e8e",
            "b33b19b3d1984052a7f189045a8cf881",
            "ec4f2cb69b034876a42404016cd56336",
            "299a94f5d724418cbfc3ff570f6fc51e",
            "09d3de90349c4402a763ca9ee05872f1",
            "f074dee7a025499ab26a16b811967b0d",
            "5dc2560eb8d441c5b3c19f4dbb082402",
            "4d055c5e91a14c789d53c41027c13f79",
            "45a86acf89d74e88b05b236064abfe9e",
            "d618cebde90545fe9d99255511dd842e",
            "8ab0e516471a4b82b205227692a9c08c",
            "77ed41da01bd4032aef1ad5d471e49b8",
            "fd45203f991d420cbca1de00404fb92d",
            "270e8fc6c4ed4583ad78e53a6048af39",
            "4e8e3b7d32a542cc8b8c60bddf03b2fd",
            "a72c96893f65478f8a618c6bed76a5a6",
            "d6f67e57cbf64403844ac492fe33a37c",
            "e71a9e5429a24743bb2f6672a675a0ea",
            "8909f3170d344efdb39f5d95cc388606",
            "54658b4b7ab543209f4de19a0b7c7477",
            "a11e3a6b07114d56bb7ea4ff54f2dfba",
            "036c0495ef404234a99a0d0945bfb137",
            "481da1832f694ef6ac05e4d3efd67ac2",
            "3ddd20f3287d4c53b9a8aa9dc376cf3b",
            "2d23114ea17646069d8d228775f503a2",
            "1583bd5518554acdb4201da0262dce80",
            "cc1ca1cd82864701b376cb77f62bb189",
            "6c95c1d838794b81a0ca58d97fc4d4bd",
            "e7d30ffd3a5d4029bc94371e89d39df2",
            "063e9d8117864292a1f4f7db6bf39fc5",
            "4098367692a34e59a5f7875546187471",
            "9fe27f7c08fd4e4690a0ed2a289176ee",
            "e48ed4e228804a56af1b995b1533eda7",
            "bc0b3747df404a2ea47be94b90b1bd96",
            "17d0c553a19d4b55a03a95b68d24734f",
            "c11f882bb2e04059b92006609471be1c",
            "bfff2a4f749c46aaa2c1a0a131f13ca4",
            "72cb183fdc3c444d85a08aa378e48a78",
            "7fab79549d3a4bc2a2a38e761c85e3bd",
            "48237a9c806440e0b100f52445502db7",
            "0f48274c81ee44bb807a7d75ee0762fd",
            "fc242de73006403886268a2ca9913375",
            "387599dd3ff9491a8527a3a90e612c82",
            "0f1a6b08ce6747c383c080ccd8c7783a",
            "b6448167f80b40508070d3aa8cbc2ea0",
            "24f579f32a3f48559ecc9b38f39f77cd",
            "6e58fc22c84c48ba9d38770bf6665ee2",
            "be452568af594df5884cbebbb23b1a47",
            "6db85f74ef4c4edb8667d373bd3f96a4",
            "be30e00481f24011a444511c896336d3",
            "2d14ed52a93d4299b5de69c89e74ce13",
            "7e196a41516f4550aa0b94d042d59756",
            "4ba2b40b6b7945dfba573c9f80465155",
            "7e761767401c4d9b97c06237b1e3b6eb",
            "e5765e9cf70240c4ac11bdce2d1eaac8"
          ]
        },
        "id": "dd0jh4gNtm41",
        "outputId": "aa734304-533a-4061-f6e4-17ee167b1933"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "\r  0%|          | 0/220 [00:00<?, ?it/s]"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "d718ecc1163942e4a5dd4cea73a90e8e"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "pytorch_model.bin:   0%|          | 0.00/268M [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "77ed41da01bd4032aef1ad5d471e49b8"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "tokenizer_config.json:   0%|          | 0.00/541 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "481da1832f694ef6ac05e4d3efd67ac2"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "bc0b3747df404a2ea47be94b90b1bd96"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "b6448167f80b40508070d3aa8cbc2ea0"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|██████████| 220/220 [01:03<00:00,  3.44it/s]"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            " \n",
            " Hybrid Search with CrossEncoderReranker Hit Rate: 0.6772727272727272\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "\n"
          ]
        }
      ],
      "source": [
        "#WARNING:  This cell takes a long time without CUDA\n",
        "from lancedb.rerankers import JinaReranker, CrossEncoderReranker, CohereReranker\n",
        "\n",
        "reranker = CrossEncoderReranker()\n",
        "hit_rate_hybrid = hit_rate(data, tbl, \"hybrid\", reranker=reranker)\n",
        "print(f\" \\n Hybrid Search with CrossEncoderReranker Hit Rate: {hit_rate_hybrid}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3lhePSeMQN-p"
      },
      "source": [
        "2. Jina AI Reranker\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "O4L0Lqi2tvZn",
        "outputId": "8ccb169c-0632-4ee0-c039-cac29956af08"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|██████████| 220/220 [01:24<00:00,  2.60it/s]"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            " \n",
            " Hybrid Search with JinaReranker Hit Rate: 0.7681818181818182\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "\n"
          ]
        }
      ],
      "source": [
        "# Jina AI Reranker\n",
        "import os\n",
        "from lancedb.rerankers import JinaReranker\n",
        "\n",
        "# Colab secret setup\n",
        "from google.colab import userdata\n",
        "os.environ[\"JINA_API_KEY\"] = userdata.get('JINA_API_KEY')\n",
        "\n",
        "reranker = JinaReranker(model_name=\"jina-reranker-v2-base-multilingual\")\n",
        "hit_rate_hybrid = hit_rate(data, tbl, \"hybrid\", reranker=reranker)\n",
        "print(f\" \\n Hybrid Search with JinaReranker Hit Rate: {hit_rate_hybrid}\")"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "os.environ[\"COHERE_API_KEY\"] = userdata.get('COHERE_API_KEY')\n",
        "\n",
        "reranker = CohereReranker()\n",
        "hit_rate_hybrid = hit_rate(data, tbl, \"hybrid\", reranker=reranker)\n",
        "print(f\" \\n Hybrid Search with CohereReranker Hit Rate: {hit_rate_hybrid}\")"
      ],
      "metadata": {
        "id": "n6VZEU9-HnDp"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "N_nRDYnVOKKo"
      },
      "source": [
        "## All results:\n",
        "\n",
        "|  Query Type| Hit-rate@5 |\n",
        "| --- | --- |\n",
        "| Vector |  0.640 |\n",
        "| FTS   |  0.595  |\n",
        "| Reranked vector (Cohere Reranker) | 0.677    |\n",
        "| Reranked fts (Cohere Reranker)  | 0.672    |\n",
        "| Hybrid (Cohere Reranker) | 0.759 |\n",
        "| Hybrid (Jina Reranker) | 0.768 |\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0ilrLMWkwK3o"
      },
      "source": [
        "## Results on other datasets\n",
        "\n",
        "### SQuAD Dataset\n",
        " [TODO]\n",
        "\n",
        "\n",
        "### Uber10K sec filing Dataset\n",
        "\n",
        "| Query Type | Hit-rate@5 |\n",
        "| --- | --- |\n",
        "| Vector |  0.608 |\n",
        "| FTS   |  0.824  |\n",
        "| Reranked vector | 0.671    |\n",
        "| Reranked fts  | 0.843    |\n",
        "| Hybrid | 0.849 |\n",
        "\n",
        "\n",
        "### Full text search is generally a good baseline!\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0nkqd-K0TF73"
      },
      "source": [
        "## Implementing Custom `Rerankers` with LanceDB\n",
        "\n",
        "LanceDB\n",
        "\n",
        "```\n",
        "from lancedb.rerankers import Reranker\n",
        "import pyarrow as pa\n",
        "\n",
        "class MyReranker(Reranker):\n",
        "    def __init__(self, param1, param2, ..., return_score=\"relevance\"):\n",
        "\n",
        "\n",
        "    def rerank_hybrid(self, query: str, vector_results: pa.Table, fts_results: pa.Table):\n",
        "        # Use the built-in merging function\n",
        "        combined_result = self.merge_results(vector_results, fts_results)\n",
        "\n",
        "        # Do something with the combined results\n",
        "        return combined_result\n",
        "\n",
        "    def rerank_vector(self, query: str, vector_results: pa.Table):\n",
        "        # Do something with the vector results\n",
        "        return vector_results\n",
        "\n",
        "    def rerank_fts(self, query: str, fts_results: pa.Table):\n",
        "        # Do something with the FTS results\n",
        "        return fts_results\n",
        "\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jKu93mPSQQ6P"
      },
      "source": [
        "## Takeaways & Tradeoffs\n",
        "\n",
        "* **Rerankers significantly improve accuracy at little cost.** Using Hybrid search and/or rerankers can significantly improve retrieval performance without spending any additional time or effort on tuning embedding models, generators, or dissecting the dataset.\n",
        "\n",
        "* **Reranking is an expensive operation.** Depending on the type of reranker you choose, they can incur significant latecy to query times. Although some API-based rerankers can be significantly faster.\n",
        "\n",
        "* **Pre-warmed GPU environments reduce latency.**  When using models locally, having a warmed-up GPU environment will significantly reduce latency. This is especially useful if the application doesn't need to be strictly realtime. Pre-warming comes at the expense of GPU resources."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OtldQkhHe_fB"
      },
      "source": [
        "## Applications\n",
        "* **Not all recommendation problems are strictly real-time.** When considering problem statements involving chatbots, search recommendations, auto-complete etc. low latency is a hard requirement.\n",
        "\n",
        "* But there another category of applications where retrieval accurate information need not be real-time. For example:\n",
        "\n",
        "  1.  **Personalized music or movie recommendation**:\n",
        "    These systems generally start off by recommending close to random / or some generally accurate recommendations. They keep improving recommendations **async** with the user interation data.\n",
        "    <img src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/f3-two-encoders.max-700x700.png\" />\n",
        "\n",
        "  \n",
        "  2.  **Social media personalised timeline**\n",
        "  \n",
        "\n",
        "  3. **Recommend blogs, videos, etc. via push notifications**\n",
        "\n",
        "   \"YouTube now gives notifications for \"recommended\", non-subscribed channels\" - https://www.reddit.com/r/assholedesign/comments/807zpe/youtube_now_gives_notifications_for_recommended/\n",
        "   \n",
        "   <img src=\"https://preview.redd.it/q8uq4e4bpfi01.png?auto=webp&s=95d1a3c1d05971d8040bb650e4287518b3a0f312\" />\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PJ0qSdCgCGi4"
      },
      "source": []
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
